Central Point
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
Yang, Qize, Yao, Shimin, Chen, Weixuan, Fu, Shenghao, Bai, Detao, Zhao, Jiaxing, Sun, Boyuan, Yin, Bowen, Wei, Xihan, Zhou, Jingren
With the rapid evolution of multimodal large language models, the capacity to deeply understand and interpret human intentions has emerged as a critical capability, which demands detailed and thoughtful reasoning. In recent studies, Reinforcement Learning (RL) has demonstrated potential in enhancing the reasoning capabilities of Large Language Models (LLMs). Nonetheless, the challenges associated with adapting RL to multimodal data and formats remain largely unaddressed. In this paper, we identify two issues in existing multimodal reasoning models: insufficient global context understanding and shortcut problems. Insufficient context understanding occurs when a model misinterprets the multimodal context, resulting in incorrect answers. The shortcut problem occurs when the model overlooks crucial clues in multimodal inputs and answers the query directly without considering the multimodal information. To tackle these issues, we emphasize the necessity for the model to reason with a clear understanding of the global context within multimodal inputs. This global context understanding can effectively prevent the model from overlooking key multimodal cues and ensure a thorough reasoning process. To ensure the accurate interpretation of multimodal context information, we implement a context reward judged by a large language model, alongside format and accuracy rewards. Additionally, to improve complex reasoning capability, we use the LLM to assign a logical reward, which assesses whether the reasoning process successfully integrates multimodal information with logical methods. We also introduce a reasoning omni-modal benchmark, IntentBench, aimed at evaluating how well models understand complex human intentions and emotions. Our proposed method achieves better performance than other open-source omni-modal models across multiple omni-modal benchmarks.
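The reward design described above can be sketched as a sum of rule-based terms (format, accuracy) and judge-based terms (context, logic). This is a minimal sketch under stated assumptions: the `<context>…</context><think>…</think><answer>…</answer>` template, exact-match accuracy, and equal weights are illustrative choices, and `context_score` / `logic_score` are placeholders for numbers an LLM judge would return. None of these details is taken from the paper.

```python
import re

# Assumed response template: global-context summary, reasoning, final answer.
FORMAT_RE = re.compile(
    r"^\s*<context>.+?</context>\s*<think>.+?</think>\s*<answer>.+?</answer>\s*$",
    re.DOTALL,
)
ANSWER_RE = re.compile(r"<answer>(.+?)</answer>", re.DOTALL)


def format_reward(response: str) -> float:
    """1.0 if the response follows the assumed tag template, else 0.0."""
    return 1.0 if FORMAT_RE.match(response) else 0.0


def accuracy_reward(response: str, gold: str) -> float:
    """Case-insensitive exact match on the extracted answer (a simplification)."""
    m = ANSWER_RE.search(response)
    return 1.0 if m and m.group(1).strip().lower() == gold.strip().lower() else 0.0


def total_reward(response, gold, context_score, logic_score, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of rule-based rewards and LLM-judge scores in [0, 1].

    context_score and logic_score stand in for an LLM judge's ratings; the
    equal default weights are a hypothetical choice, not the paper's values.
    """
    w_fmt, w_acc, w_ctx, w_log = weights
    return (w_fmt * format_reward(response)
            + w_acc * accuracy_reward(response, gold)
            + w_ctx * context_score
            + w_log * logic_score)
```

Keeping the judge scores as plain inputs separates the cheap, deterministic checks from the expensive LLM calls, so either side can be swapped out independently.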
Virtual airways heatmaps to optimize point of entry location in lung biopsy planning systems
Gil, Debora, Lloret, Pere, Diez-Ferrer, Marta, Sanchez, Carles
Purpose: We present a virtual model to optimize the point of entry (POE) in lung biopsy planning systems. Our model computes the quality of a biopsy sample taken from each potential POE, taking into account the margin of error that arises from discrepancies between the orientation in the planning simulation and the actual orientation during the operation. The study also examines the impact of lesion characteristics. Methods: The quality of the biopsy is given by a heatmap projected onto the skeleton of a patient-specific model of the airways. The skeleton provides a 3D representation of the airway structure, while the heatmap intensity represents the potential amount of tissue that could be extracted from each POE. This amount of tissue is determined by the intersection of the lesion with a cone that represents the area of uncertainty in the introduction of biopsy instruments. The cone, lesion, and skeleton are modelled as graphical objects that define a 3D scene of the intervention. Results: We simulated different settings of the intervention scene from a single anatomy extracted from a CT scan and two lesions, one with a regular and one with an irregular shape. The different scenarios are simulated by systematically rotating each lesion placed at different distances from the airways. Analysis of the heatmaps for the different settings shows a strong impact of lesion orientation for the irregular shape and of distance for both shapes. Conclusion: The proposed heatmaps help to visually assess the optimal POE and to identify whether multiple optimal POEs exist in different zones of the bronchi. They also allow us to model the maximum allowable error in navigation systems and to study which variables have the greatest influence on the success of the operation. Additionally, they help determine at what point this influence could jeopardize the operation.
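The heat value at a single POE can be approximated by sampling the lesion as a point cloud and counting the samples that fall inside the uncertainty cone. This is a minimal sketch under assumptions of our own: the cone apex sits at the POE, its axis points at the lesion centroid, its half-angle encodes the orientation error, and an optional `max_depth` (hypothetical) bounds how far the instrument can reach. It is not the authors' implementation.

```python
import numpy as np


def poe_heat(poe, direction, lesion_pts, half_angle_deg, max_depth=np.inf):
    """Number of lesion samples inside a cone whose apex is the POE.

    poe: (3,) skeleton point; direction: (3,) planned insertion direction;
    lesion_pts: (N, 3) samples of the lesion volume; half_angle_deg models the
    orientation error; max_depth (assumed) limits instrument reach.
    """
    axis = np.asarray(direction, dtype=float)
    axis = axis / np.linalg.norm(axis)
    v = np.asarray(lesion_pts, dtype=float) - np.asarray(poe, dtype=float)
    dist = np.linalg.norm(v, axis=1)
    along = v @ axis  # depth of each sample along the cone axis
    cos_angle = np.divide(along, dist, out=np.zeros_like(dist), where=dist > 0)
    inside = (
        (along > 0)
        & (along <= max_depth)
        & (cos_angle >= np.cos(np.radians(half_angle_deg)))
    )
    return int(inside.sum())


def heatmap(skeleton_pts, lesion_pts, half_angle_deg=10.0, max_depth=np.inf):
    """Heat value per skeleton point, aiming each cone at the lesion centroid."""
    centroid = np.asarray(lesion_pts, dtype=float).mean(axis=0)
    return np.array([
        poe_heat(p, centroid - p, lesion_pts, half_angle_deg, max_depth)
        for p in np.asarray(skeleton_pts, dtype=float)
    ])
```

With uniformly sampled lesion points the count is proportional to the intersected volume; rotating an irregular point cloud before calling `heatmap` reproduces, in miniature, the kind of orientation study described in the Results.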
Language-Based Bayesian Optimization Research Assistant (BORA)
Cissé, Abdoulatif, Evangelopoulos, Xenophon, Gusev, Vladimir V., Cooper, Andrew I.
Many important scientific problems involve multivariate optimization coupled with slow and laborious experimental measurements. These complex, high-dimensional searches can be defined by non-convex optimization landscapes that resemble needle-in-a-haystack surfaces, leading to entrapment in local minima. Contextualizing optimizers with human domain knowledge is a powerful approach to guide searches to localized fruitful regions. However, this approach is susceptible to human confirmation bias and it is also challenging for domain experts to keep track of the rapidly expanding scientific literature. Here, we propose the use of Large Language Models (LLMs) for contextualizing Bayesian optimization (BO) via a hybrid optimization framework that intelligently and economically blends stochastic inference with domain knowledge-based insights from the LLM, which is used to suggest new, better-performing areas of the search space for exploration. Our method fosters user engagement by offering real-time commentary on the optimization progress, explaining the reasoning behind the search strategies. We validate the effectiveness of our approach on synthetic benchmarks with up to 15 independent variables and demonstrate the ability of LLMs to reason in four real-world experimental tasks where context-aware suggestions boost optimization performance substantially.
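A hybrid loop of this kind can be sketched as a small Gaussian-process UCB optimizer that hands control to a language model when progress stalls. This is a minimal one-dimensional sketch under stated assumptions: the stagnation trigger (`patience`), the RBF kernel, and the `llm_suggest` callable are hypothetical stand-ins. BORA's actual trigger, surrogate model, and prompting are not reproduced here.

```python
import numpy as np


def rbf(a, b, length=0.1):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-6, length=0.1):
    """Zero-mean GP posterior mean and standard deviation (unit prior variance)."""
    K = rbf(x_obs, x_obs, length) + noise * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_query, length)
    mu = Ks.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))


def hybrid_optimize(f, llm_suggest, n_iter=20, patience=3, beta=2.0, seed=0):
    """Maximize f on [0, 1]; call llm_suggest(history) after `patience` stalls.

    llm_suggest is hypothetical: it stands in for a prompted LLM that reads
    the (x, y) history and returns a promising x in [0, 1].
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501)
    xs = list(rng.uniform(0.0, 1.0, size=3))
    ys = [f(x) for x in xs]
    stall = 0
    for _ in range(n_iter):
        used_llm = stall >= patience
        if used_llm:
            # Stagnation: ask the stand-in LLM for a knowledge-based suggestion.
            x_next = float(np.clip(llm_suggest(list(zip(xs, ys))), 0.0, 1.0))
        else:
            y = np.array(ys)
            mu, sd = gp_posterior(np.array(xs), y - y.mean(), grid)
            x_next = float(grid[np.argmax(mu + beta * sd)])  # GP-UCB acquisition
        best_so_far = max(ys)
        xs.append(x_next)
        ys.append(f(x_next))
        # Reset after an LLM call so a poor suggestion does not trigger another at once.
        stall = 0 if (used_llm or ys[-1] > best_so_far) else stall + 1
    i = int(np.argmax(ys))
    return xs[i], ys[i]
```

Counting stalled iterations is one simple way to spend LLM calls sparingly; a real system would also have to parse and validate the model's free-text output before evaluating it.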
Efficient Cutting Tool Wear Segmentation Based on Segment Anything Model
Li, Zongshuo, Huo, Ding, Meurer, Markus, Bergs, Thomas
Tool wear conditions impact the surface quality of the workpiece and its final geometric precision. In this research, we propose an efficient tool wear segmentation approach based on Segment Anything Model, which integrates U-Net as an automated prompt generator to streamline the processes of tool wear detection. Our evaluation covered three Point-of-Interest generation methods and further investigated the effects of variations in training dataset sizes and U-Net training intensities on resultant wear segmentation outcomes. The results consistently highlight …
Tool wear is an inevitable phenomenon in the actual machining process. It leads to alterations in the cutting zone's process variables like the forces and temperatures exerted on both the tool and workpiece. These conditions not only influence the rate of tool wear but also affect the surface quality and geometric precision of the workpiece [1]. Therefore, tool wear is one of the key determinants of both tool costs and the quality of the finished workpiece, emphasizing the necessity for monitoring during the machining process to ensure optimal outcomes [1, 2].
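One way a U-Net output can drive SAM is to convert its wear-probability map into point prompts. This is a minimal sketch with two illustrative strategies (highest-probability peaks and region centroid); they are assumptions and not necessarily the three Point-of-Interest methods evaluated in the paper. In the `segment_anything` package, (x, y) pixel coordinates like these, with foreground labels of 1, are what `SamPredictor.predict` accepts as `point_coords` and `point_labels`.

```python
import numpy as np


def point_prompts(prob_map, threshold=0.5, k=3, strategy="peaks"):
    """Turn a U-Net wear-probability map into (x, y) point prompts for SAM.

    Illustrative strategies (assumptions, not the paper's methods):
      "peaks"    - the k highest-probability pixels above the threshold
      "centroid" - the centroid of all pixels above the threshold
    Returns an (n, 2) integer array of (x, y) pixel coordinates, possibly empty.
    """
    mask = prob_map >= threshold
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return np.empty((0, 2), dtype=int)
    if strategy == "centroid":
        return np.array([[int(round(xs.mean())), int(round(ys.mean()))]])
    if strategy == "peaks":
        order = np.argsort(prob_map[ys, xs])[::-1][:k]  # strongest pixels first
        return np.stack([xs[order], ys[order]], axis=1)
    raise ValueError(f"unknown strategy: {strategy}")


def iou(pred, gt):
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else np.logical_and(pred, gt).sum() / union
```

The centroid of a non-convex wear region can fall outside the region itself, which is one reason to prefer peak-based points for irregular wear patterns.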
An LLM Feature-based Framework for Dialogue Constructiveness Assessment
Zhou, Lexin, Farag, Youmna, Vlachos, Andreas
Research on dialogue constructiveness assessment focuses on (i) analysing conversational factors that influence individuals to take specific actions, win debates, change their perspectives or broaden their open-mindedness and (ii) predicting constructive outcomes following dialogues for such use cases. These objectives can be achieved by training either interpretable feature-based models (which often involve costly human annotations) or neural models such as pre-trained language models (which have empirically shown higher task accuracy but lack interpretability). We propose a novel LLM feature-based framework for assessing dialogue constructiveness that combines the strengths of feature-based and neural approaches while mitigating their downsides. The framework first defines a set of dataset-independent and interpretable linguistic features, which can be extracted both by prompting an LLM and by simple heuristics. Such features are then used to train LLM feature-based models. We apply this framework to three datasets of dialogue constructiveness and find that our LLM feature-based models significantly outperform standard feature-based models and neural models, and tend to learn more robust prediction rules instead of relying on superficial shortcuts (as seen with neural models). Further, we demonstrate that interpreting these LLM feature-based models can yield valuable insights into what makes a dialogue constructive.
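The two-stage framework (extract interpretable features, then fit a transparent model) can be sketched as follows. This is a minimal sketch assuming a hypothetical set of rule-based features and plain logistic regression; in the framework described above, LLM-extracted features (for example, a rating returned by a prompt) would be appended as extra columns of the same feature matrix.

```python
import numpy as np

HEDGES = ("maybe", "perhaps", "i think", "might", "could")  # illustrative list


def heuristic_features(utterances):
    """Hypothetical rule-based features: [question rate, hedges per turn, mean turn length]."""
    text = " ".join(utterances).lower()
    n = max(len(utterances), 1)
    question_rate = sum(u.strip().endswith("?") for u in utterances) / n
    hedge_rate = sum(text.count(h) for h in HEDGES) / n
    mean_len = float(np.mean([len(u.split()) for u in utterances])) if utterances else 0.0
    return [question_rate, hedge_rate, mean_len]


def fit_logreg(X, y, lr=0.5, epochs=2000):
    """Gradient-descent logistic regression on standardized features.

    Standardizing first makes the learned weights comparable, so their signs
    and magnitudes can be read as feature importances.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-9
    Z = (X - mu) / sd
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted P(constructive)
        grad = p - y
        w -= lr * Z.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b, mu, sd
```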
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
Liang, Weixin, Izzo, Zachary, Zhang, Yaohui, Lepp, Haley, Cao, Hancheng, Zhao, Xuandong, Chen, Lingjiao, Ye, Haotian, Liu, Sheng, Huang, Zhi, McFarland, Daniel A., Zou, James Y.
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e., beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews that report lower confidence, were submitted close to the deadline, and were written by reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
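The corpus-level estimate can be sketched as a one-parameter mixture fitted by maximum likelihood. This is a minimal bag-of-words sketch, assuming each document is drawn either from a human word distribution P or an AI word distribution Q (both estimated from reference texts) and that the mixing weight α is found by grid search; the paper's exact token model and estimator may differ.

```python
import numpy as np


def estimate_alpha(counts, log_p, log_q, n_grid=1001):
    """MLE of the fraction alpha of AI-modified documents in a corpus.

    counts: (n_docs, vocab) word counts per document.
    log_p, log_q: (vocab,) log word probabilities from human-written and
    AI-generated reference texts. Each document is modelled as drawn from
    (1 - alpha) * P + alpha * Q.
    """
    # Per-document multinomial log-likelihoods; the multinomial coefficient is
    # identical under P and Q, so it does not affect the argmax over alpha.
    lp = counts @ log_p
    lq = counts @ log_q
    grid = np.linspace(0.0, 1.0, n_grid)
    with np.errstate(divide="ignore"):  # log(0) at the grid endpoints is -inf
        ll = [np.logaddexp(np.log1p(-a) + lp, np.log(a) + lq).sum() for a in grid]
    return float(grid[int(np.argmax(ll))])
```

Working in log space with `logaddexp` avoids underflow, since a single document's likelihood is a product of many small probabilities.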
Vector Quantile Regression on Manifolds
Pegoraro, Marco, Vedula, Sanketh, Rosenberg, Aviv A., Tallini, Irene, Rodolà, Emanuele, Bronstein, Alex M.
Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on a Euclidean domain. Although the notion of quantiles was recently extended to multivariate distributions, QR for multivariate distributions on manifolds remains underexplored, even though many important applications inherently involve data distributed on, e.g., spheres (climate measurements), tori (dihedral angles in proteins), or Lie groups (attitude in navigation). By leveraging optimal transport theory and the notion of $c$-concave functions, we meaningfully define conditional vector quantile functions of high-dimensional variables on manifolds (M-CVQFs). Our approach allows for quantile estimation, regression, and computation of conditional confidence sets. We demonstrate the approach's efficacy and provide insights regarding the meaning of non-Euclidean quantiles through preliminary synthetic data experiments.
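For readers unfamiliar with $c$-concavity, the standard optimal-transport facts behind this construction can be written out. These are textbook statements (McCann's theorem for the squared geodesic cost, with $\mu$ absolutely continuous), not the paper's own notation; in a conditional setting one would let the potential $\psi$ also depend on the features $x$, which is a hedged reading rather than the authors' exact definition.

```latex
% c-concavity of a potential \psi on a compact Riemannian manifold \mathcal{M}:
\psi(u) = \inf_{y \in \mathcal{M}} \big[\, c(u, y) - \varphi(y) \,\big]
  \quad \text{for some } \varphi : \mathcal{M} \to \mathbb{R} \cup \{-\infty\}.

% For c(u, y) = \tfrac{1}{2} d(u, y)^2, the optimal map pushing a reference
% measure \mu (e.g. uniform on \mathcal{M}) onto the target \nu is
Q(u) = \exp_{u}\!\big(-\nabla \psi(u)\big), \qquad Q_{\#}\mu = \nu .

% Euclidean sanity check: \exp_u(v) = u + v and
% \psi(u) = \tfrac{1}{2}\|u\|^2 - g(u) with g convex give
Q(u) = u - \big(u - \nabla g(u)\big) = \nabla g(u),
% i.e. the gradient-of-convex-potential (Brenier) vector quantile map.
```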
How to disagree well: Investigating the dispute tactics used on Wikipedia
de Kock, Christine, Stafford, Tom, Vlachos, Andreas
Disagreements are frequently studied from the perspective of either detecting toxicity or analysing argument structure. We propose a framework of dispute tactics that unifies these two perspectives, as well as other dialogue acts which play a role in resolving disputes, such as asking questions and providing clarification. This framework includes a preferential ordering among rebuttal-type tactics, ranging from ad hominem attacks to refuting the central argument. Using this framework, we annotate 213 disagreements (3,865 utterances) from Wikipedia Talk pages. This allows us to investigate research questions around the tactics used in disagreements; for instance, we provide empirical validation of the approach to disagreement recommended by Wikipedia. We develop models for multilabel prediction of dispute tactics in an utterance, achieving the best performance with a transformer-based label powerset model. Adding an auxiliary task to incorporate the ordering of rebuttal tactics yields a further, statistically significant improvement in performance. Finally, we show that these annotations can be used to provide useful additional signals to improve performance on the task of predicting escalation.
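The label powerset approach turns multilabel prediction into ordinary multiclass classification by treating each observed combination of tactics as a single class. This is a minimal sketch of that encoding, plus one possible ordinal target derived from the rebuttal ordering for an auxiliary task. The tactic names are hypothetical stand-ins for the paper's inventory, and the classifier itself (transformer-based in the paper) is omitted.

```python
# Hypothetical rebuttal tactics, ordered from weakest to strongest.
REBUTTAL_ORDER = ["ad_hominem", "contradiction", "counterargument", "refute_central_point"]


class LabelPowerset:
    """Map each distinct set of labels to one class id, and back."""

    def fit(self, label_sets):
        # Deterministic class order: sort label sets by their sorted members.
        self.classes_ = sorted({frozenset(s) for s in label_sets}, key=lambda s: sorted(s))
        self.index_ = {s: i for i, s in enumerate(self.classes_)}
        return self

    def transform(self, label_sets):
        return [self.index_[frozenset(s)] for s in label_sets]

    def inverse_transform(self, class_ids):
        return [set(self.classes_[i]) for i in class_ids]


def rebuttal_rank(labels):
    """Rank of the strongest rebuttal tactic present, or -1 if none.

    One possible ordinal target for an auxiliary task (an assumption).
    """
    ranks = [REBUTTAL_ORDER.index(t) for t in labels if t in REBUTTAL_ORDER]
    return max(ranks, default=-1)
```

A known limitation of the powerset encoding is that it can only predict label combinations seen in training, which is acceptable when, as here, the label inventory is small.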
Yes-Yes-Yes: Proactive Data Collection for ACL Rolling Review and Beyond
Dycke, Nils, Kuznetsov, Ilia, Gurevych, Iryna
The shift towards publicly available text sources has enabled language processing at unprecedented scale, yet leaves underserved those domains where public and openly licensed data is scarce. Proactively collecting text data for research is a viable strategy to address this scarcity, but lacks a systematic methodology that accounts for the many ethical, legal and confidentiality-related aspects of data collection. Our work presents a case study on proactive data collection in peer review, a challenging and under-resourced NLP domain. We outline ethical and legal desiderata for proactive data collection and introduce "Yes-Yes-Yes", the first donation-based peer reviewing data collection workflow that meets these requirements. We report on the implementation of Yes-Yes-Yes at ACL Rolling Review and empirically study the implications of proactive data collection for the dataset size and the biases induced by the donation behavior on the peer reviewing platform.
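Consent-gated collection and the accompanying bias analysis can be sketched as a filter plus a per-group release rate. This is a minimal sketch assuming, for illustration only, that each record carries three independent consent flags; the actual parties and checkpoints of Yes-Yes-Yes are defined by the authors, and the flag names below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical consent checkpoints; not the workflow's actual definitions.
REQUIRED = ("reviewer", "author", "license")


@dataclass
class Record:
    review_id: str
    score: int  # e.g. an overall recommendation score (illustrative)
    consents: dict = field(default_factory=dict)  # checkpoint -> bool


def releasable(rec, required=REQUIRED):
    """Include a record only if every required checkpoint is an explicit yes."""
    return all(rec.consents.get(p) is True for p in required)


def donation_rate_by(records, key):
    """Fraction of releasable records per group, to probe donation bias."""
    total, kept = defaultdict(int), defaultdict(int)
    for r in records:
        g = key(r)
        total[g] += 1
        kept[g] += releasable(r)
    return {g: kept[g] / total[g] for g in total}
```

Treating a missing flag the same as a "no" keeps the default opt-in: a record is never released because someone forgot to answer.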
Special Issue! Foundational Algorithms, Where They Came From, Where They're Going
Years ago, I had to choose between a neural network and a decision tree learning algorithm. It was necessary to pick an efficient one, because we planned to apply the algorithm to a very large set of users on a limited compute budget. I went with a neural network. I hadn't used boosted decision trees in a while, and I thought they required more computation than they actually do -- so I made a bad call. Fortunately, my team quickly revised my decision, and the project was successful. This experience was a lesson in the importance of learning, and continually refreshing, foundational knowledge. If I had refreshed my familiarity with boosted trees, I would have made a better decision.